1,625 research outputs found
Online Importance Weight Aware Updates
An importance weight quantifies the relative importance of one example over
another, coming up in applications of boosting, asymmetric classification
costs, reductions, and active learning. The standard approach for dealing with
importance weights in gradient descent is via multiplication of the gradient.
We first demonstrate the problems of this approach when importance weights are
large, and argue in favor of more sophisticated ways for dealing with them. We
then develop an approach which enjoys an invariance property: that updating
twice with importance weight is equivalent to updating once with importance
weight . For many important losses this has a closed form update which
satisfies standard regret guarantees when all examples have . We also
briefly discuss two other reasonable approaches for handling large importance
weights. Empirically, these approaches yield substantially superior prediction
with similar computational performance while reducing the sensitivity of the
algorithm to the exact setting of the learning rate. We apply these to online
active learning yielding an extraordinarily fast active learning algorithm that
works even in the presence of adversarial noise
Slow Learners are Fast
Online learning algorithms have impressive convergence properties when it
comes to risk minimization and convex games on very large problems. However,
they are inherently sequential in their design which prevents them from taking
advantage of modern multi-core architectures. In this paper we prove that
online learning with delayed updates converges well, thereby facilitating
parallel online learning.Comment: Extended version of conference paper - NIPS 200
A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology
- …